
ITEMS TO RESOLVE:

  • CDF: Distribution or Density?
  • Change $\approx$ to tildes

6.041x - Unit 5: Continuous Random Variables

Notes by Leo Robinovitch


Core Concepts:

Probability Density Functions (PDFs): replace PMFs (Probability Mass Functions) from discrete random variables

  • Discrete Random Variables have PMFs $p_X(x)$ where the following holds true

    • $P(a \leq X \leq b) = \sum\limits_{x:a \leq x \leq b}p_X(x)$
    • $p_X(x) \geq 0$
    • $\sum\limits_{x}p_X(x) = 1$

  • A random variable $X$ is continuous if it can be described by a PDF $f_X(x)$ for which the following hold:

    • $P(a \leq X \leq b) = \int_{a}^{b}f_X(x)dx$
    • $f_X(x) \geq 0$
    • $\int_{-\infty}^{\infty}f_X(x)dx = 1$

  • Units of PDF are probability per unit length:

    • $P(a \leq X \leq a + \delta) \approx f_X(a) \bullet \delta \to$ using rectangular approximation for small $\delta$

  • Also notable from above: $P(X = a) = 0 \to$ any particular point has zero probability under a PDF (exactly contrary to PMFs)
  • PDFs do not have to be continuous functions (they can have discontinuities)
  • Finally, $P(a \leq X \leq b) = P(a < X < b) \to$ endpoints don't matter because each single point has zero probability
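
The rectangular approximation and the normalization property can be checked numerically. The sketch below assumes an exponential density with $\lambda = 2$ purely as an example PDF; the helper names `exp_pdf` and `integrate` are illustrative and not part of the course material.

```python
import math

def exp_pdf(x, lam=2.0):
    """Example PDF (assumed for illustration): exponential with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, delta = 0.5, 0.01
exact = integrate(exp_pdf, a, a + delta)           # P(a <= X <= a + delta)
approx = exp_pdf(a) * delta                        # f_X(a) * delta
total = integrate(exp_pdf, 0.0, 50.0, n=200_000)   # area under the PDF

print(f"interval probability: {exact:.6f}, rectangle: {approx:.6f}")
print(f"total area: {total:.6f}")
```

The two interval values agree to within about one percent for this $\delta$, and the gap shrinks as $\delta$ gets smaller.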

Continuous Uniform Random Variable: rather than integers between a and b being possible (discrete case), any real number between a and b is possible, with PDF $f_X(x) = \frac{1}{b - a}$ for $a \leq x \leq b$ and $0$ otherwise

Expectation of Continuous Random Variables: average in a large number of independent repetitions of a probabilistic experiment $$E[X] = \int_{-\infty}^{\infty}xf_X(x)dx$$

  • Holds true as long as $\int_{-\infty}^{\infty}|x|f_X(x)dx < \infty$
  • Still true that expectation/average is "center of gravity" of PDF
  • If $X \geq 0$, then $E[X] \geq 0$
  • If $a \leq X \leq b$, then $a \leq E[X] \leq b$

Expected Value Rule for Continuous Random Variables: $$E[g(X)] = \int_{-\infty}^{\infty}g(x)f_X(x)dx$$

Linearity of Expectations: $$E[aX + b] = aE[X] + b$$

  • Derivation from applying Expected Value Rule to $g(X) = aX + b$ and separating terms (same as with discrete)

Variance of Continuous Random Variables:

  • Recall $var(X) = E[(X - \mu)^2]$ where $\mu = E[X]$
  • Using Expected Value Rule: $$var(X) = \int_{-\infty}^{\infty}(x - \mu)^2f_X(x)dx$$

Standard Deviation of Continuous Random Variables: $$\sigma_X = \sqrt{var(X)}$$

Variance Rules:

  • Still true that $var(aX + b) = a^2var(X)$
  • Useful formula still holds: $var(X) = E[X^2] - (E[X])^2$

Mean and Variance of Uniform Continuous Random Variable:

  • $E[X] = \int_{-\infty}^{\infty}xf_X(x)dx = \int_{a}^{b}x\frac{1}{b-a}dx = \frac{a + b}{2} \to$ same as discrete and intuitive b/c uniform PDF is symmetrical about expectation
  • Variance calculation:
    • $E[X^2] = \int_{a}^{b}x^2\frac{1}{b-a}dx$
    • $var(X) = E[X^2] - (E[X])^2 = \frac{(b - a)^2}{12} \to$ similar but not same as discrete

  • As such, $\sigma = \frac{b - a}{\sqrt{12}}$ for uniform PDF
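
As a quick numeric check of the uniform mean and variance, the sketch below integrates $x f_X(x)$ and $x^2 f_X(x)$ directly. The endpoints $a = 2$, $b = 5$ and the helper `integrate` are arbitrary choices made for illustration.

```python
def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # assumed example endpoints

def pdf(x):
    """Uniform PDF on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

mean = integrate(lambda x: x * pdf(x), a, b)               # E[X]
second_moment = integrate(lambda x: x * x * pdf(x), a, b)  # E[X^2]
variance = second_moment - mean ** 2                       # E[X^2] - (E[X])^2

print(mean, (a + b) / 2)
print(variance, (b - a) ** 2 / 12)
```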

Exponential Random Variables: single parameter $\lambda > 0 \to$ generally models waiting time until an event occurs

  • Random Variable $X$: some waiting time until event occurs
  • Looks very similar to Geometric Random Variable, but in continuous form
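  • PDF: $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ and $0$ otherwise $\to$ starts at $\lambda$ when $x = 0$ and decays exponentially at rate $\lambda$
  • Tail Probability $P(X \geq a)$ for $a \geq 0$: $$P(X \geq a) = \int_{a}^{\infty}\lambda e^{-\lambda x}dx = e^{-\lambda a}$$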
  • Note that this means that if $a = 0$, area under whole PDF is 1 as expected
  • Expected value of Exponential Random Variable: $$E[X] = \int_{0}^{\infty} x \bullet \lambda e^{-\lambda x}dx = \frac{1}{\lambda}$$
  • To calculate Variance of Exponential Random Variable: $$E[X^2] = \int_{0}^{\infty} x^2 \bullet \lambda e^{-\lambda x}dx = \frac{2}{\lambda^2}$$
    $$var(X) = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2}$$
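
The exponential formulas can be sanity-checked by simulation. This is a rough sketch using `random.expovariate` from the Python standard library; the rate $\lambda = 2$, the threshold $a = 1$, the sample size, and the seed are all arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)
lam = 2.0
samples = [random.expovariate(lam) for _ in range(200_000)]

sample_mean = statistics.fmean(samples)      # compare with 1 / lam
sample_var = statistics.pvariance(samples)   # compare with 1 / lam**2

a = 1.0
tail = sum(s >= a for s in samples) / len(samples)  # compare with exp(-lam * a)

print(sample_mean, 1 / lam)
print(sample_var, 1 / lam ** 2)
print(tail, math.exp(-lam * a))
```

The sample values only approximate the formulas; the agreement improves as the sample size grows.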

Memorylessness of Exponential Random Variable: again analogous to Geometric Discrete Random Variable

  • Example: lightbulb lifetime $T$ (time to burnout) modeled as exponential random variable with parameter $\lambda$.
    • Recall $P(T > x) = e^{-\lambda x}$ for $x \geq 0$

  • Knowing that certain lightbulb has already been operating for $t$ time units and is still working $\to T > t$
  • How much longer will this lightbulb operate? $\to X$ is remaining lifetime, $X = T - t$
  • Probability that lightbulb lasts another $x$ time units: $P(X > x | T > t)$
    • Using conditional probability definition, can calculate that $P(X > x | T > t) = e^{-\lambda x} \to$ probability that used lightbulb lives another $x$ time units is exactly the same as probability new lightbulb will live another $x$ time units!
    • This is the "memorylessness" of the exponential distribution

  • Notable fact is now that $P(t \leq T \leq t + \delta | T > t) = P(0 \leq T \leq \delta) \approx f_T(0) \bullet \delta = \lambda \delta$
    • For some small time interval $\delta$, if the lightbulb is alive at time $t$, it has probability of burning out in the next $\delta$ amount of time of $\lambda \delta$
      • This is like independently flipping a coin every $\delta$ time step where $P(success) = \lambda \delta$ on each flip. Analogous to geometric RV in that time step $\delta$ in the Exponential discretizes the probability of some event occurring over time.
      • Exponential RV corresponds to total time elapsed (akin to number of $\delta$s) until first success
      • Lays foundation for Poisson process, covered later.
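
Both the memorylessness identity and the coin-flip picture can be verified with a few lines of arithmetic. The sketch below assumes $\lambda = 0.5$, $t = 3$, $x = 2$ and a step of $\delta = 10^{-4}$; these values are illustrative only.

```python
import math

lam, t, x = 0.5, 3.0, 2.0

def tail(s):
    """P(T > s) for an exponential with rate lam, s >= 0."""
    return math.exp(-lam * s)

# Conditional probability definition:
# P(T > t + x | T > t) = P(T > t + x) / P(T > t)
conditional = tail(t + x) / tail(t)
fresh = tail(x)  # P(T > x) for a brand-new bulb

# Coin-flip view: one flip per step of length delta, each succeeding
# with probability lam * delta. Surviving x time units means no success
# in x / delta consecutive flips (a geometric-style tail).
delta = 1e-4
flips = round(x / delta)
no_success = (1 - lam * delta) ** flips

print(conditional, fresh, no_success)
```

As $\delta \to 0$ the discrete value `no_success` converges to $e^{-\lambda x}$, which is the sense in which the exponential is the continuous counterpart of the geometric.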

Cumulative Distribution Functions (CDFs): unifying representation of both discrete and continuous random variables $$F_X(x) = P(X \leq x)$$

  • Because of additivity property of probabilities, can break CDF into multiple probabilities: $P(X \leq 4) = P(X \leq 3) + P(3 < X \leq 4)$
  • For continuous Random Variables: $$F_X(x) = \int_{-\infty}^{x}f_X(t)dt$$
  • As such, value of PDF is derivative of CDF at any differentiable point: $$\frac{dF_X}{dx}(x) = f_X(x)$$
  • For discrete Random Variables: $$F_X(x) = \sum\limits_{k \leq x}p_X(k)$$
    • Takes form of "staircase function" $\to$ size of each step equals the PMF value at that point

  • General CDF Properties:
    • Non-decreasing $\to$ if $y \geq x$, then $F_X(y) \geq F_X(x)$
    • $F_X(x)$ tends to $1$ as $x \to \infty$
    • $F_X(x)$ tends to $0$ as $x \to -\infty$
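
To illustrate the PDF/CDF relationship and the general properties, the sketch below uses the exponential CDF $F_X(x) = 1 - e^{-\lambda x}$ (which follows from the tail probability $e^{-\lambda x}$) and compares a finite-difference slope with the PDF. The rate $\lambda = 1.5$, the grid, and the helper `derivative` are assumptions for the example.

```python
import math

lam = 1.5

def cdf(x):
    """Exponential CDF: F_X(x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def pdf(x):
    """Exponential PDF: f_X(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.8
slope = derivative(cdf, x0)  # should be close to pdf(x0)

grid = [i * 0.1 - 2.0 for i in range(121)]  # points from -2.0 to 10.0
values = [cdf(x) for x in grid]
is_non_decreasing = all(u <= v for u, v in zip(values, values[1:]))

print(slope, pdf(x0), is_non_decreasing, cdf(-5.0), cdf(50.0))
```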

Normal or Gaussian Random Variable: key to probability (Central Limit Theorem to be discussed)

  • Common in applications of probability:
    • Nice analytical properties
    • Good model of noise when consists of addition of many small, independent noise terms (common in reality)

Standard Normal Form of Gaussian Random Variable: mean of $0$, variance of $1 \to$ simplest form $$N(0,1): f_X(x) = \frac{1}{\sqrt{2 \pi}} e^{\frac{-x^2}{2}}$$

  • $E[X] = 0$ as $e^{\frac{-x^2}{2}}$ is centered about $x = 0$
  • $var(X) = 1$, proven with integration by parts
  • $\frac{1}{\sqrt{2 \pi}}$ is a normalization term, as $\int_{-\infty}^{\infty}e^{\frac{-x^2}{2}}dx = \sqrt{2 \pi}$

General Form of Gaussian Random Variable: mean of $\mu$, variance of $\sigma^2$ $$N(\mu, \sigma^2): f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{\frac{-(x - \mu)^2}{2 \sigma^2}}$$

  • $E[X] = \mu$ as $e^{\frac{-(x - \mu)^2}{2 \sigma^2}}$ is centered about $x = \mu$
  • $var(X) = \sigma^2$
  • Again, $\frac{1}{\sigma \sqrt{2 \pi}}$ is normalization term to make integral = 1

Linear Functions of Normal Random Variables: normality is preserved when forming linear functions of other normal random variables (proven later)

  • Let $Y = aX + b$ where $X \sim N(\mu, \sigma^2)$
    • $E[Y] = a \mu + b$
    • $var(Y) = a^2 \sigma^2$
    • New concept: $Y \sim N(a \mu + b,\ a^2 \sigma^2) \to Y$ is a Normal Random Variable

  • Special case: if $a = 0$ such that $Y = b \to$ Y is now a Discrete Random Variable
    • Think of Y as still a Continuous Normal Random Variable with zero spread: $Y \sim N(b, 0)$
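
A small simulation can confirm the mean and variance formulas for $Y = aX + b$. It checks only the first two moments, not normality itself. The parameters, sample size, and seed below are arbitrary assumptions.

```python
import random
import statistics

random.seed(1)
mu, sigma = 3.0, 2.0   # X ~ N(mu, sigma^2)
a, b = -1.5, 4.0       # Y = a * X + b

xs = [random.gauss(mu, sigma) for _ in range(200_000)]
ys = [a * x + b for x in xs]

y_mean = statistics.fmean(ys)      # compare with a * mu + b
y_var = statistics.pvariance(ys)   # compare with a**2 * sigma**2

print(y_mean, a * mu + b)
print(y_var, a ** 2 * sigma ** 2)
```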

Calculating Probabilities with Normal Random Variables:

  • No closed-form Cumulative Distribution Function for Normal RVs $\to$ no way to explicitly solve for $F_Y(y) = P(Y \leq y) = \Phi(y)$ given $Y \sim N(0, 1)$
  • Instead, use tables to determine probability that $Y$ takes value less than or equal to $y \to \Phi(y) = P(Y \leq y)$:
  • NOTE THAT THIS IS FOR STANDARD NORMAL RANDOM VARIABLES (STANDARD NORMAL TABLE)
  • Rows of the table give the "coarse grain" value of $y$, with columns giving the "fine grain" adjustment
  • Also note $\Phi(-y) = 1 - \Phi(y)$

Standardizing a Random Variable (Mean Normalization):

  • Let $X$ be some random variable (not necessarily normal) with mean $\mu$ and variance $\sigma^2 > 0$
  • Define $Y = \frac{X - \mu}{\sigma}$
    • $E[Y] = \frac{E[X] - \mu}{\sigma} = 0$
    • $var(Y) = \frac{1}{\sigma^2}var(X) = 1$

  • If $X \sim N(\mu, \sigma^2)$, then $Y \sim N(0, 1)$
    • This is convenient b/c non-standard normals can be expressed in terms of standard normals: $X = \sigma Y + \mu \to$ X is non-standard, Y is standard
      • Use table for Y to calculate probabilities for X

  • Example: $X \sim N(\mu = 6,\ \sigma^2 = 2^2)$ and $Y = \frac{X - \mu}{\sigma} \to Y \sim N(0, 1)$ \begin{align} P(2 \leq X \leq 8) &= P\left(\frac{2 - 6}{2} \leq \frac{X - 6}{2} \leq \frac{8 - 6}{2}\right) \\ &= P(-2 \leq Y \leq 1) \\ &= \Phi(1) - \Phi(-2) \\ &= \Phi(1) - (1 - \Phi(2)) \to \text{use table} \end{align}
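
In place of a printed table, $\Phi$ can be evaluated through the error function, using the identity $\Phi(y) = \frac{1}{2}\left(1 + \mathrm{erf}(y / \sqrt{2})\right)$. The sketch below applies it to the example above; the function names `phi` and `normal_interval_prob` are hypothetical helpers, not part of the notes.

```python
import math

def phi(y):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def normal_interval_prob(lo, hi, mu, sigma):
    """P(lo <= X <= hi) for X ~ N(mu, sigma^2), by standardizing the endpoints."""
    return phi((hi - mu) / sigma) - phi((lo - mu) / sigma)

# Example from the notes: X ~ N(6, 2^2), P(2 <= X <= 8)
p = normal_interval_prob(2.0, 8.0, mu=6.0, sigma=2.0)

# Same quantity in the "table" form Phi(1) - (1 - Phi(2))
p_table_form = phi(1.0) - (1.0 - phi(2.0))

print(round(p, 4), round(p_table_form, 4))
```

Both expressions evaluate to roughly 0.82, and the symmetry $\Phi(-y) = 1 - \Phi(y)$ is what makes the two forms agree.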

Conditional PDFs:

  • Comparison summary with discrete PMF cases seen previously (Discrete PMF $\to$ Continuous PDF)
  • For Conditional PDF of $X$ given that $X \in A$, just scale PDF by $P(A)$: $$f_{X|X \in A}(x) = \frac{f_X(x)}{P(A)}\ \ for\ \ x \in A,\ \ and\ \ 0\ \ otherwise$$

Total Probability Theorem and Total Expectation Theorem for Continuous Random Variables (PDFs):

  • Using fact that derivative of CDF is PDF in first written line to go from F $\to$ f:

Mixed Distributions $\mathbf{\to}$ both Discrete and Continuous Random Variables:

  • A mixed random variable is described by continuous and discrete random variables:
  • Mixed Random Variable $X$ does not have a PMF or PDF of its own! Only a CDF can be used to describe it:

Joint Continuous Random Variables and Joint PDFs:

  • Recall that a double integral is finding a volume
  • Two Random Variables are jointly continuous if they can be described by a joint PDF
  • Summary of equations analogous to discrete PMF case:
  • Joint PDF of 2 random variables has units [Probability per Unit Area]:
    • $P(a \leq X \leq a + \delta \cap c \leq Y \leq c + \delta) \approx f_{X,Y}(a,c) \bullet \delta^2$

  • If a set B in which $(x,y) \in B$ has zero area, then $P((x,y) \in B) = 0$
    • Consequence: if $X$ is a continuous RV and $Y = X$, then $X$ and $Y$ are not jointly continuous, because all of the probability lies on the line $y = x$, which has zero area $\to$ two continuous RVs are not automatically jointly continuous

Joint and Marginal PDFs:

  • Recall previously that knowing the joint PMF we also know the marginal PMFs ($p_X(x)$ and $p_Y(y)$ from $p_{X,Y}(x,y)$)
    • $p_X(x) = \sum\limits_{y}p_{X,Y}(x,y)$
    • $p_Y(y) = \sum\limits_{x}p_{X,Y}(x,y)$

  • Similarly, derivation for marginal PDF:
    • $F_X(x) = P(X \leq x) = \int\limits_{-\infty}^{x} \int\limits_{-\infty}^{\infty} f_{X,Y}(s,t)dt ds$
    • Differentiate both sides with respect to $x$: $f_X(x) = \frac{dF_X}{dx}(x) = \int\limits_{-\infty}^{\infty} f_{X,Y}(x,y)dy$

  • Therefore:
    • $f_X(x) = \int f_{X,Y}(x,y)dy$
    • $f_Y(y) = \int f_{X,Y}(x,y)dx$

  • Uniform Joint PDF example:
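
As a minimal numeric sketch of marginalization, assume a hypothetical joint PDF that is uniform on the triangle $0 \leq y \leq x \leq 1$ (area $\frac{1}{2}$, so the density is $2$ inside). This particular region is an assumption chosen for illustration. Analytically the marginals are $f_X(x) = 2x$ and $f_Y(y) = 2(1 - y)$.

```python
def joint_pdf(x, y):
    """Hypothetical joint PDF: uniform on the triangle 0 <= y <= x <= 1."""
    return 2.0 if 0.0 <= y <= x <= 1.0 else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def marginal_x(x):
    """f_X(x): integrate the joint PDF over y."""
    return integrate(lambda y: joint_pdf(x, y), 0.0, 1.0)

def marginal_y(y):
    """f_Y(y): integrate the joint PDF over x."""
    return integrate(lambda x: joint_pdf(x, y), 0.0, 1.0)

fx = marginal_x(0.3)  # analytic value: 2 * 0.3
fy = marginal_y(0.3)  # analytic value: 2 * (1 - 0.3)

print(fx, fy)
```

Even though the joint PDF is uniform, neither marginal is uniform here, because the length of the slice through the triangle changes with $x$ and $y$.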

More than 2 Random Variables for Joint PDFs:

  • Exactly analogous to PMF discrete case (replace summations with integrals and p's with f's):

Continuous PDF Case $\mathbf{\to}$ Functions of More than 2 RVs, Expected Value Rule, Linearity of Expectations:

Joint CDFs:

  • Previously with single random variable PDFs:
    • $F_X(x) = P(X \leq x) = \int\limits_{-\infty}^{x}f_X(t)dt$
    • $f_X(x) = \frac{dF_X}{dx}(x)$

  • With multi-RV joint CDFs:
    • $F_{X,Y}(x,y) = P(X \leq x \cap Y \leq y) = \int\limits_{-\infty}^{y} \int\limits_{-\infty}^{x} f_{X,Y}(s,t)ds dt$
    • $f_{X,Y}(x,y) = \frac{\partial^2F_{X,Y}}{\partial x \partial y}(x,y)$

Conditional PDFs when Conditioning on Another Random Variable:

  • One slide summary:
  • Note that because of definition, $\ f_{X|Y}(x|y) \geq 0$
  • Think of $f_{X|Y}(x|y)$ as some $f_X(x)$ with a fixed y $\to$ a slice of $f_{X,Y}(x,y)$, scaled to integrate to 1
  • Can extract "multiplication rule" to define conditional joint PDF with probability densities (NOT same as probabilities!): $$f_{X,Y}(x,y) = f_Y(y) \bullet f_{X|Y}(x|y) = f_X(x) \bullet f_{Y|X}(y|x)$$

Total Probability Theorem, Conditional Expectation, and Total Expectation Theorem for Continuous PDFs:

  • Summary slide:
  • Various forms of expected value rule for continuous RVs and PDFs are the same as for discrete, just changing p $\to$ f and sums to integrals, e.g.:
    • $E[g(X)|Y = y] = \int\limits_{-\infty}^{\infty} g(x) f_{X|Y}(x|y)dx$

Independence of Continuous RVs and PDFs:

  • Core concept: knowledge about one or more random variables does not provide any information about the other(s)
  • Analogous to PMF discrete RVs, independent RVs if $f_{X,Y}(x,y) = f_X(x)f_Y(y)$ for all $x, y$
  • Comparing this to multiplication rule: $f_{X,Y}(x,y) = f_Y(y) f_{X|Y}(x|y) = f_X(x) f_{Y|X}(y|x)$, with $f_{X|Y}(x|y)$ defined wherever $f_Y(y) > 0$ and $f_{Y|X}(y|x)$ defined wherever $f_X(x) > 0$
    • If $X$ and $Y$ independent, then $f_{X|Y}(x|y) = f_X(x)$ for all $x$ and all $y$ with $f_Y(y) > 0$, and $f_{Y|X}(y|x) = f_Y(y)$ for all $y$ and all $x$ with $f_X(x) > 0$

  • Additionally, similar to discrete case, if $X$ and $Y$ independent:
    • $E[XY] = E[X] \bullet E[Y]$
    • $var(X + Y) = var(X) + var(Y)$
    • $g(X)$ and $h(Y)$ are then also independent, so $E[g(X)h(Y)] = E[g(X)] \bullet E[h(Y)]$
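
These consequences of independence can be checked by simulation. The sketch below assumes $X$ uniform on $[0, 1]$ and $Y$ exponential with $\lambda = 1$, drawn independently. The distributions, sample size, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(2)
n = 200_000
xs = [random.uniform(0.0, 1.0) for _ in range(n)]   # X ~ Uniform(0, 1)
ys = [random.expovariate(1.0) for _ in range(n)]    # Y ~ Exponential(1), independent of X

# E[XY] versus E[X] * E[Y]
e_xy = statistics.fmean([x * y for x, y in zip(xs, ys)])
e_x_e_y = statistics.fmean(xs) * statistics.fmean(ys)

# var(X + Y) versus var(X) + var(Y)
var_sum = statistics.pvariance([x + y for x, y in zip(xs, ys)])
sum_var = statistics.pvariance(xs) + statistics.pvariance(ys)

print(e_xy, e_x_e_y)
print(var_sum, sum_var)
```

The pairs match only up to sampling noise; with dependent samples (for example `ys = xs`) the gap would not shrink as `n` grows.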

Independent Normal Random Variables:

  • Important due to noise modeling (often sum of many small independent "noiselets")
  • Given two independent standard normal random variables $X$ and $Y$, each with PDF of the form $f_X(x) = \frac{1}{\sqrt{2 \pi}} \exp(\frac{-x^2}{2})$: $$f_{X,Y}(x,y) = f_X(x) \bullet f_Y(y) = \frac{1}{\sqrt{2 \pi}} \exp(\frac{-x^2}{2}) \bullet \frac{1}{\sqrt{2 \pi}} \exp(\frac{-y^2}{2}) = \frac{1}{2 \pi} \exp(-\frac{1}{2}(x^2 + y^2))$$
  • Given now two independent general normal random variables: $$f_{X,Y}(x,y) = f_X(x) \bullet f_Y(y) = \frac{1}{2 \pi \sigma_x \sigma_y} \exp(-\frac{(x - \mu_x)^2}{2 \sigma_x^2} - \frac{(y - \mu_y)^2}{2 \sigma_y^2})$$
  • The contours of the negative exponential look as follows (smaller $f_{X,Y}(x,y)$ as contours go outward):
  • Center is determined by the means, and the variances determine the stretching along each coordinate axis
    • Because of independence, stretching is only along the coordinate axes (the bivariate normal distribution allows stretching in other directions when dependence is present)

Bayes Rule for Continuous Random Variables and PDFs:

  • Prior: $\to f_X(x)$
  • Model of Observations: $\to f_{Y|X}(y|x)$
  • Posterior (Conditional Distribution of X): $\to f_{X|Y}(x|y)$
  • Numerator (Joint PDF by Multiplication Rule): $\to f_{X,Y}(x,y) = f_X(x)f_{Y|X}(y|x)$
  • Denominator (Marginal PDF from Joint PDF using Total Probability Theorem): $\to f_Y(y) = \int f_X(x') f_{Y|X}(y|x')dx'$
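  • Summary: $$f_{X|Y}(x|y) = \frac{f_X(x)f_{Y|X}(y|x)}{f_Y(y)} = \frac{f_X(x)f_{Y|X}(y|x)}{\int f_X(x')f_{Y|X}(y|x')dx'}$$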

Third Variation on Bayes Rule: One Discrete, One Continuous Random Variable (Mixed RVs):

  • Write out probabilities in various forms with multiplication rule, then put in PDF + PMF form
    • Use Total Probability Theorem to get marginal distributions (denominators)
  • Summary slide:

Bayes Rule Example 1: Discrete Unknown, Continuous Measurement: From above summary slide: $$p_{K|Y}(k|y) = \frac{p_K(k)f_{Y|K}(y|k)}{f_Y(y)}\ \ with\ \ f_Y(y) = \sum\limits_{k'}p_K(k')f_{Y|K}(y|k')$$

  • Unknown $K$ is equally likely to be -1 or 1 $\to p_K(k) = \frac{1}{2}$ for K = -1, K = 1 (prior is easy here)
  • Measurement $Y = K + W$, where $W \sim N(0, 1)$ is noise independent of $K$
    • Given $K = 1$: $Y \sim N(1, 1)$
    • Given $K = -1$: $Y \sim N(-1, 1)$
    • As such, model $f_{Y|K}(y|k) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2}(y - k)^2}$

  • Marginal using Total Probability Theorem: $f_Y(y) = \frac{1}{2}\frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2}(y + 1)^2} + \frac{1}{2} \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2}(y - 1)^2}$
  • Now that each term is defined, do a bunch of algebra and get the posterior (probability that $K = 1$ given some $Y = y$): $$p_{K|Y}(1|y) = \frac{1}{1 + e^{-2y}}$$
    • Note the sigmoid function, going to $P \to 1$ as $y \to \infty$ and $P \to 0$ as $y \to -\infty$
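
The algebra can be checked numerically by evaluating Bayes rule directly and comparing with the sigmoid form. The sketch below follows the model above; the helper names and the test points are illustrative assumptions.

```python
import math

def normal_pdf(y, mean):
    """PDF of N(mean, 1) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def posterior_k1(y):
    """p_{K|Y}(1 | y) from Bayes rule with prior p_K(1) = p_K(-1) = 1/2."""
    numerator = 0.5 * normal_pdf(y, 1.0)
    marginal = 0.5 * normal_pdf(y, -1.0) + 0.5 * normal_pdf(y, 1.0)
    return numerator / marginal

def sigmoid_form(y):
    """Closed form of the posterior: 1 / (1 + exp(-2y))."""
    return 1.0 / (1.0 + math.exp(-2.0 * y))

checks = [(y, posterior_k1(y), sigmoid_form(y)) for y in (-3.0, -0.5, 0.0, 0.7, 2.5)]
for y, direct, closed in checks:
    print(y, direct, closed)
```

At $y = 0$ the measurement favors neither value of $K$, so the posterior stays at the prior value of $\frac{1}{2}$.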

Bayes Rule Example 2: Continuous Unknown, Discrete Measurement: From above summary slide: $$f_{Y|K}(y|k) = \frac{f_Y(y)p_{K|Y}(k|y)}{p_K(k)}\ \ with\ \ p_K(k) = \int f_Y(y')p_{K|Y}(k|y')dy'$$

  • Measurement $K$: Bernoulli Random Variable whose parameter is the unknown $Y \to P(K = 1 | Y = y) = y$ and $P(K = 0 | Y = y) = 1 - y$
  • Unknown $Y$: model as uniform on $[0,1] \to f_Y(y) = 1$ for $y \in [0,1]$ and $0$ otherwise
  • Find the distribution of $Y$ given $K = 1$
  • Model: $p_{K|Y}(1|y) = P(K = 1|Y = y) = y$
  • Marginal using Total Probability Theorem: $p_K(1) = \int\limits_{0}^{1}1 \bullet y dy = \frac{1}{2}$
  • Now, posterior is calculated (new PDF of $Y$ given $K = 1$, e.g. bias of a coin based on a single coin flip): $$f_{Y|K}(y|1) = 2y,\ \ y \in [0,1]$$
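
The same steps can be reproduced numerically: integrate prior times model to get $p_K(1)$, then divide. This is a minimal sketch using a midpoint rule; the helper `integrate` and the evaluation point $y = 0.25$ are assumptions for illustration.

```python
def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def prior(y):
    """f_Y(y): uniform on [0, 1]."""
    return 1.0 if 0.0 <= y <= 1.0 else 0.0

def likelihood(y):
    """p_{K|Y}(1 | y) = y."""
    return y

# Marginal p_K(1) via the Total Probability Theorem
p_k1 = integrate(lambda y: prior(y) * likelihood(y), 0.0, 1.0)

def posterior(y):
    """f_{Y|K}(y | 1) from Bayes rule."""
    return prior(y) * likelihood(y) / p_k1

area = integrate(posterior, 0.0, 1.0)  # a valid PDF integrates to 1

print(p_k1, posterior(0.25), area)
```

Observing $K = 1$ shifts the density toward larger values of $y$, which matches the posterior $2y$ on $[0, 1]$.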

Lecture 8, 9, and 10 Exercises:

Currently unavailable in archived course.


Solved Problems:

#1: Question 1:


Problem Set 5:

See 2010 Problem Set 5.
